---
title: "GGPlot & Plotly"
author: "Edward Coleman"
date: "02-22-2024"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    fig_width: 6
    number_sections: yes
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: true
    theme: readable
    fig_height: 4
---

<style type="text/css">

div#TOC li {
    list-style:none;
    background-color:lightgray;
    background-image:none;
    background-repeat:none;
    background-position:0;
    font-family: Arial, Helvetica, sans-serif;
    color: #780c0c;
}

/* mouse over link */
div#TOC a:hover {
  color: red;
}

/* unvisited link */
div#TOC a:link {
  color: blue;
}

h1.title {
  font-size: 24px;
  color: Darkblue;
  text-align: center;
  font-family: Arial, Helvetica, sans-serif;
  font-variant-caps: normal;
}
h4.author { 
  font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}
h4.date { 
  font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 {
  font-size: 22px;
  font-family: "Times New Roman", Times, serif;
  color: darkred;
  text-align: center;
}
h2 { 
  font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: navy;
  text-align: left;
}

h3 { 
  font-size: 15px;
  font-family: "Times New Roman", Times, serif;
  color: navy;
  text-align: left;
}

h4 { 
  font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: darkred;
  text-align: left;
}

/* unvisited link */
a:link {
  color: green;
}

/* visited link */
a:visited {
  color: green;
}

/* mouse over link */
a:hover {
  color: red;
}

/* selected link */
a:active {
  color: yellow;
}

</style>

```{r setup, include=FALSE, comment=NA}
options(repos = list(CRAN="http://cran.rstudio.com/"))
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("cowplot")) {
   install.packages("cowplot")
   library(cowplot)
}
if (!require("latex2exp")) {
   install.packages("latex2exp")
   library(latex2exp)
}
if (!require("plotly")) {
   install.packages("plotly")
   library(plotly)
}
if (!require("gapminder")) {
   install.packages("gapminder")
   library(gapminder)
}
if (!require("png")) {
    install.packages("png")    
    library("png")
}
if (!require("RCurl")) {
    install.packages("RCurl")    
    library("RCurl")
}
if (!require("colourpicker")) {
    install.packages("colourpicker")              
    library("colourpicker")
}
if (!require("gganimate")) {
    install.packages("gganimate")              
    library("gganimate")
}
if (!require("gifski")) {
    install.packages("gifski")              
    library("gifski")
}
if (!require("magick")) {
    install.packages("magick")              
    library("magick")
}
if (!require("grDevices")) {
    install.packages("grDevices")              
    library("grDevices")
}
if (!require("jpeg")) {
    install.packages("jpeg")              
    library("jpeg")
}
if (!require("ggridges")) {
    install.packages("ggridges")              
    library("ggridges")
}
if (!require("plyr")) {
    install.packages("plyr")              
    library("plyr")
}
if (!require("ggiraph")) {
    install.packages("ggiraph")              
    library("ggiraph")
}
if (!require("highcharter")) {
    install.packages("highcharter")              
    library("highcharter")
}
if (!require("forecast")) {
    install.packages("forecast")              
    library("forecast")
}
if (!require("leaflet")) {
    install.packages("leaflet")              
    library("leaflet")
}
if (!require("sf")) {
    install.packages("sf")              
    library("sf")
}
if (!require("Stat2Data")) {
   install.packages("Stat2Data")
   library(Stat2Data)
}

knitr::opts_chunk$set(
  echo = TRUE,       
  warning = FALSE,   
  results = "markup",
  message = FALSE,
  comment = NA
)

library(colorspace)
library(dplyr)
library(tidyverse)
library(ggforce)
library(ggridges)
library(treemapify)
library(forcats)
library(statebins)
library(sf)
library(cowplot)

options(digits = 3)
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  cache = FALSE,
  fig.align = 'center',
  fig.width = 6,
  fig.asp = 0.618,
  fig.show = "hold"
)
options(dplyr.print_min = 6, dplyr.print_max = 6)
```

# Data Summary
```{r}
income <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/income_per_person.csv")

life <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/life_expectancy_years.csv")

# Reshape data set such that there are only three columns (Geo, Year, & Income)
new_income <- pivot_longer(income, cols = -geo, names_to = "year", values_to = "income")

new_life <- pivot_longer(life, cols = -geo, names_to = "year", values_to = "life.expectancy")

## Create new data set
LifeExpIncom <- merge(new_life, new_income, by = c("geo", "year"))

## Read in More Data
country <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/countries_total.csv")

pop <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/population_total.csv")

new_pop <- pivot_longer(pop, cols = -geo, names_to = "year", values_to = "population")

## Merge LifeExpIncom with Country
merged <- merge(LifeExpIncom, country, by.x = "geo", by.y = "name", all.x = TRUE)

## Merge Population with Merged Data
fin_data <- merge(new_pop, merged, by = c("geo", "year"), all.x = TRUE)

## Get Data for Year 2000
final_data <- subset(fin_data, year =="X2000")
```
  We first read in two datasets, "income" and "life," which hold income per person and life expectancy by country over many years. "income" has 193 observations and 220 variables, while "life" has 187 observations and 220 variables. Each has one row per country and one column per year, so we reshape both into long format with three columns: geo, year, and either income or life.expectancy. We then merge the reshaped sets on geo and year into a dataset called "LifeExpIncom," which contains geo, year, life.expectancy, and income (40953 observations and 4 variables, consistent with 187 countries × 219 year columns).

  Next, we read in two more datasets: "country" (240 observations and 11 variables) and "pop" (195 observations and 220 variables), representing country and population data, respectively. We reshape "pop" into the same long format so that it can be joined to "LifeExpIncom" on geo and year. We then merge "LifeExpIncom" with "country," matching geo to the country name, and merge the reshaped "pop" set with that result, keeping every population row. This creates "fin_data" (42705 observations and 15 variables, consistent with 195 countries × 219 year columns). Finally, we subset the data to the year 2000, which gives our "final_data" set (195 observations and 15 variables).
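
The reshape and merge steps described above can be illustrated on a small scale. The sketch below uses made-up toy tables (the names `toy_income`, `toy_life`, and their values are hypothetical, not taken from the CSV files) to show how `pivot_longer()` turns year columns into rows and how `all.x = TRUE` changes which rows survive a `merge()`.

```{r}
library(tidyr)

# Toy wide tables shaped like the CSVs: one row per country, one column per year
# (hypothetical values; read.csv() prefixes numeric column names with "X")
toy_income <- data.frame(geo = c("A", "B"), X1999 = c(100, 200), X2000 = c(110, 210))
toy_life   <- data.frame(geo = c("A", "C"), X1999 = c(60, 70),   X2000 = c(61, 71))

# Wide to long: every year column becomes a row
toy_income_long <- pivot_longer(toy_income, cols = -geo, names_to = "year", values_to = "income")
toy_life_long   <- pivot_longer(toy_life, cols = -geo, names_to = "year", values_to = "life.expectancy")

# Default merge() keeps only keys present in both tables (country "A" here)
toy_inner <- merge(toy_life_long, toy_income_long, by = c("geo", "year"))

# all.x = TRUE keeps every row of the first table and fills missing values with NA
toy_left <- merge(toy_life_long, toy_income_long, by = c("geo", "year"), all.x = TRUE)

nrow(toy_inner)
nrow(toy_left)
```

In this toy case the default merge drops country "C," while the `all.x = TRUE` version keeps it with a missing income. The same distinction is why the row count of "fin_data" follows the population table rather than "LifeExpIncom."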

# GGPlot
  The scatter plot below shows the relationship between income, life expectancy, and population size across regions in the year 2000. Each point represents a country, with the size of the point corresponding to that country's population and the color indicating its region.
```{r}
scatter_pop <- ggplot(final_data, aes(x = life.expectancy, y = income, color = region, size = population)) +
  geom_point() +
  labs(title = "Life Expectancy vs. Income per Region (2000)",
       x = "Life Expectancy",
       y = "Income",
       size = "Population",
       color = "Region")
scatter_pop
```
  From the plot, we observe a slightly positive correlation between income and life expectancy: countries with higher incomes tend to have longer life expectancies. Additionally, countries in the Americas and Asia tend to have larger populations, as indicated by the larger point sizes. This might also suggest that countries with larger populations have longer life expectancies, although point size alone is weak evidence for such a link. European countries appear to have the longest life expectancies, with most of their points on the far right side of the graph, although their populations are not as large as those of other regions.

  Next, we subset the data to the year 2015, which overwrites our "final_data" set (195 observations and 15 variables). The full dataset "fin_data," which includes data from all years and not just 2015, is left unchanged.
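
The "slightly positive" association read off the scatter plot can also be checked numerically. The sketch below is only an illustration on hypothetical toy values (`toy_cor` is not the real data). It compares the Pearson correlation on the raw income scale, the Pearson correlation on a log10 income scale, and Spearman's rank correlation.

```{r}
# Hypothetical toy values; income spans orders of magnitude, as in the real data
toy_cor <- data.frame(
  income          = c(500, 1000, 2000, 8000, 30000, 60000),
  life.expectancy = c(50, 55, 62, 70, 78, 81)
)

# Pearson correlation on the raw scale measures linear association only
r_raw <- cor(toy_cor$income, toy_cor$life.expectancy, use = "complete.obs")

# Log-transforming income can straighten a curved relationship
r_log <- cor(log10(toy_cor$income), toy_cor$life.expectancy, use = "complete.obs")

# Spearman's rank correlation is unchanged by monotone transformations of either variable
r_rank <- cor(toy_cor$income, toy_cor$life.expectancy, method = "spearman")

round(c(raw = r_raw, log10 = r_log, spearman = r_rank), 3)
```

If the same calls are applied to "final_data," a raw-scale correlation that is much lower than the log-scale or rank-based one would indicate that the relationship is strong but curved rather than weak.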
```{r, comment=NA}
## Get Data for Year 2015
final_data <- subset(fin_data, year =="X2015")
```
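
The subsets above compare `year` against strings such as "X2015" because `read.csv()`, with its default `check.names = TRUE`, puts an "X" in front of column names that start with a digit. As a minimal sketch, assuming the prefix is stripped once after reshaping, the year can be stored as an integer and reused for any subset. The helper `rows_for_year()` and the toy data are hypothetical.

```{r}
# Toy long-format data with the "X"-prefixed year labels produced by read.csv()
toy_long <- data.frame(
  geo        = c("A", "A", "B", "B"),
  year       = c("X2000", "X2015", "X2000", "X2015"),
  population = c(10, 12, 20, 25)
)

# Strip the leading "X" once and store the year as an integer
toy_long$year <- as.integer(sub("^X", "", toy_long$year))

# Hypothetical helper: return the rows for a single year
rows_for_year <- function(data, yr) {
  data[data$year == yr, , drop = FALSE]
}

toy_2000 <- rows_for_year(toy_long, 2000)
toy_2015 <- rows_for_year(toy_long, 2015)
toy_2015
```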

# Plotly
  The plot below shows the relationship between income, life expectancy, and population size across regions. Each point represents a country, with the size of the point reflecting that country's population and the color indicating its region. Because raw populations span several orders of magnitude, the point size is computed from the logarithm of the population, which compresses the range of sizes and makes the plot clearer and easier to interpret. Additionally, the x-axis uses a logarithmic scale to better visualize the wide range of income values. The plot is set up as an animation with one frame per year, giving a dynamic view of global trends in income, life expectancy, and population. Note that "final_data" holds only the 2015 subset at this point, so the animation as written has a single frame. Supplying a dataset that spans several years, such as "fin_data" with the same cleaning applied, would produce one frame per year.
```{r, comment=NA}
pal.IBM <- c("#332288", "#117733", "#0072B2","#D55E00", "#882255")
pal.IBM <- setNames(pal.IBM, c("Asia", "Europe", "Africa", "Americas", "Oceania"))

# Ensure no NA values in the region column
final_data <- final_data %>%
  filter(!is.na(region))  # Remove rows with NA in the region column

# Filter data to remove NA values and convert year to numeric
final_data$year <- as.numeric(gsub("X", "", final_data$year))
final_data <- final_data %>%
  filter(!is.na(life.expectancy) & !is.na(income) & !is.na(population))

fig <- final_data %>%
  plot_ly(
    x = ~income, 
    y = ~life.expectancy, 
    size = ~(2*log(population)-11)^2,
    color = ~region, 
    colors = pal.IBM,   # custom colors
    frame = ~year,      # the time variable that defines the animation frames
    text = ~paste("Country:", geo,
                  "<br>Region:", region,
                  "<br>Year:", year,
                  "<br>Life Expectancy:", life.expectancy,
                  "<br>Population:", population,
                  "<br>Income per Person:", income),
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig <- fig %>% layout(
    title = "Income vs. Life Expectancy Over Time",
    # Use a single xaxis list; passing xaxis twice can drop one of the settings
    xaxis = list(
      type = "log",
      title = "Income per Person (Log Scale)"
    ),
    yaxis = list(title = "Life Expectancy")
  )

fig
```
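
To see how much the size expression in the chunk above compresses the population range, it can be evaluated on a few round numbers. This is a small sketch. The helper name `size_from_population()` and the toy populations are hypothetical, while the formula is the one used in the plot.

```{r}
# The marker-size expression from the plot, wrapped in a hypothetical helper
size_from_population <- function(population) {
  (2 * log(population) - 11)^2
}

toy_population <- c(1e5, 1e6, 1e7, 1e8, 1e9)
toy_size <- size_from_population(toy_population)

# Raw populations span a factor of 10,000; the transformed sizes span a factor below 10
max(toy_population) / min(toy_population)
max(toy_size) / min(toy_size)

# Caveat: 2 * log(p) - 11 is zero near p = exp(5.5) (about 245), so the squared
# value starts growing again for populations below that point
size_from_population(c(100, 245, 1000))
```

The caveat only matters if the data contain populations of a few hundred people or fewer. For such values the expression is no longer increasing in population.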

The x-axis represents income per person for each country on a logarithmic scale, with higher incomes positioned further to the right. The y-axis represents life expectancy, with longer life expectancies positioned higher on the axis. Some countries stand out in the scatter plot because of their larger populations, which are drawn as larger points, and their higher incomes. The visualization lets us assess whether countries with higher incomes generally have longer life expectancies and whether population size is related to income level.

In the animated plot, each frame corresponds to one year, showing how the relationship between income, life expectancy, and population size evolves over time. As described earlier, point size reflects population and color indicates region, which makes regional trends and differences easier to see. Stepping through the frames shows how economic and health outcomes have changed across regions and time periods, providing insight into global development patterns.
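
The frame-per-year behavior can be tried on a tiny example. The sketch below is a minimal illustration on hypothetical toy data (`toy_anim` and its values are made up). It relies on the `frame` argument of `plot_ly()` and on `animation_slider()` from the plotly package, and it produces one frame for each distinct value of `year`.

```{r}
library(plotly)

# Hypothetical toy data: two countries observed in three years
toy_anim <- data.frame(
  geo  = rep(c("A", "B"), each = 3),
  year = rep(c(2000, 2010, 2020), times = 2),
  income = c(1000, 1500, 2500, 20000, 28000, 35000),
  life.expectancy = c(55, 60, 66, 76, 79, 82)
)

toy_fig <- plot_ly(
  toy_anim,
  x = ~income,
  y = ~life.expectancy,
  text = ~geo,
  frame = ~year,          # one animation frame per distinct year
  type = "scatter",
  mode = "markers"
) %>%
  layout(
    xaxis = list(type = "log", title = "Income per Person (Log Scale)"),
    yaxis = list(title = "Life Expectancy")
  ) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))

# The number of frames follows the number of distinct years in the data
length(unique(toy_anim$year))

toy_fig
```

A dataset restricted to a single year gives a single frame, so the slider only becomes useful when the data passed to `plot_ly()` cover several years.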












